nnLB: Next Pitch
A Novel Deep Learning Approach to MLB Pitch Prediction
Introduction
Organizations across all major sports leagues have adopted data-driven decision-making approaches to remain current and competitive in recent decades. Among these leagues, Major League Baseball (MLB) is widely recognized as the pioneer in embracing analytics. In fact, an entire domain of sports-based analytics, termed sabermetrics, is devoted to baseball-specific statistics and analysis. Consequently, a wealth of high-resolution public data and untapped opportunities exist within the world of baseball.
Baseball enthusiasts would agree that success within the sport relies heavily on the game within the game. Identifying and exploiting small advantages can yield significant returns in achieving desired outcomes. Here, we introduce a deep learning method that utilizes in-game video footage to predict pitches. This endeavor is motivated by two factors. First, a reliable pitch classifier can provide batters with an edge during live at-bats. Second, an interpretable deep learning model can give pitchers insight into how predictable they are and how they can conceal their pitches more effectively.
Methods
Defining the Sample
Prediction tasks must be segmented by pitcher, since pitchers have unique motions, tendencies, and pitching arsenals (i.e., different pitchers throw different types of pitches). As such, we focused our proof-of-concept analysis on two pitchers: Tyler Glasnow (2019 regular season) and Walker Buehler (2021 regular season). Further, most pitchers throw from two distinct positions (the windup and the stretch), depending on the game situation. We focused on pitches thrown from the stretch, since that motion is more compact.
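The sample-selection logic above can be sketched as a simple filter over tabular pitch records. The field names and player IDs below are placeholders, and "thrown from the stretch" is approximated by the heuristic "runners on base", an assumption rather than a field Statcast records directly:

```python
# Hypothetical sketch: selecting pitches for one pitcher, thrown from the stretch.
# Field names loosely follow Statcast conventions (on_1b/on_2b/on_3b hold the
# runner's ID or None); IDs here are placeholders, not real MLBAM IDs.

def from_stretch(pitch: dict) -> bool:
    """Heuristic: pitchers typically work from the stretch with runners on base."""
    return any(pitch.get(base) is not None for base in ("on_1b", "on_2b", "on_3b"))

def select_sample(pitches: list[dict], pitcher_id: int) -> list[dict]:
    """Keep only pitches thrown by `pitcher_id` from the stretch."""
    return [p for p in pitches if p["pitcher_id"] == pitcher_id and from_stretch(p)]

pitches = [
    {"pitcher_id": 123456, "pitch_type": "FF", "on_1b": 999001, "on_2b": None, "on_3b": None},
    {"pitcher_id": 123456, "pitch_type": "CU", "on_1b": None, "on_2b": None, "on_3b": None},
    {"pitcher_id": 654321, "pitch_type": "SL", "on_1b": 999001, "on_2b": None, "on_3b": None},
]

sample = select_sample(pitches, pitcher_id=123456)  # keeps only the FF pitch
```

In practice the same filter would run over a full season of Statcast rows rather than an inline list.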
Data Collection
I. Web Scraping
BaseballSavant is a website dedicated to providing the public with access to historical MLB data. These data include video footage (available since 2018) and Statcast tabular data (available since 2015) for every pitch thrown in MLB. We built web scrapers to retrieve both the video source URL and the pitch type for every pitch of interest. Video source URLs were used as inputs for our feature extraction pipeline.
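The scraping step can be sketched as two pieces: building a query URL for a pitcher-season, and parsing pitch-type labels out of the returned CSV. The endpoint path and query parameter names below are assumptions modeled on Baseball Savant's Statcast search export, not a documented API; a maintained wrapper (e.g., pybaseball) is a safer choice in practice:

```python
import csv
import io
from urllib.parse import urlencode

# Sketch of the scraping step. The endpoint and parameter names are assumptions
# modeled on Baseball Savant's Statcast search CSV export; verify against the
# site (or use a maintained wrapper) before relying on them.
BASE = "https://baseballsavant.mlb.com/statcast_search/csv"

def build_query_url(pitcher_id: int, season: int) -> str:
    """Construct a Statcast search URL for one pitcher-season (hypothetical params)."""
    params = {"player_type": "pitcher", "pitchers_lookup[]": pitcher_id, "season": season}
    return f"{BASE}?{urlencode(params)}"

def parse_pitch_types(csv_text: str) -> list[str]:
    """Extract the pitch_type label from each row of a downloaded CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["pitch_type"] for row in reader]

url = build_query_url(pitcher_id=123456, season=2019)

# Tiny stand-in for a downloaded CSV; real exports carry many more columns.
example_csv = "pitch_type,release_speed\nFF,97.1\nCU,82.4\n"
labels = parse_pitch_types(example_csv)  # ['FF', 'CU']
```

The per-pitch video source URLs come from a separate page scrape and would be joined to these rows by pitch identifier.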
II. Feature Extraction
A highlight of our work is the feature extraction process, termed the Video2Data pipeline. The pipeline works as follows. First, a video is downloaded from the source URL and converted to a series of images (or frames). Second, an object detection model is used to determine the location of the pitcher in each frame. The coordinates reported by the model are subsequently used to blur the background of each image. The object detection model used in this step is a custom Detectron2 model (Faster R-CNN) that was trained on a self-annotated dataset to specifically detect pitchers. This step is necessary for scalable and reliable feature extraction since the OpenPose pose estimation software (used in the following step) detects humans non-specifically. Third, OpenPose is run on each image to extract the coordinates of 25 keypoints on the pitcher's body. Keypoint coordinates from each frame are finally merged into a single tabular data structure. Example outputs generated at each step of the Video2Data pipeline are shown in Figure 1.
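Two of the steps above can be sketched with NumPy stand-ins: blurring everything outside the detected bounding box, and merging per-frame keypoints into one feature row per pitch. In the real pipeline the frames come from the downloaded video, the box from the Detectron2 detector, and the 25 keypoints from OpenPose; the shapes, mean-filter blur, and flattening scheme here are illustrative assumptions:

```python
import numpy as np

def blur_background(frame: np.ndarray, box: tuple[int, int, int, int], k: int = 5) -> np.ndarray:
    """Box-blur the whole frame, then paste the sharp pitcher crop back in.

    `box` is (x0, y0, x1, y1) from the detector; the mean filter below is a
    simple NumPy stand-in for an OpenCV-style blur.
    """
    x0, y0, x1, y1 = box
    pad = k // 2
    padded = np.pad(frame, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # Mean filter implemented by averaging all k*k shifted views of the frame.
    blurred = np.mean(
        [padded[i:i + frame.shape[0], j:j + frame.shape[1]]
         for i in range(k) for j in range(k)],
        axis=0,
    ).astype(frame.dtype)
    out = blurred.copy()
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]  # keep the pitcher region sharp
    return out

def merge_keypoints(frames_keypoints: list[np.ndarray]) -> np.ndarray:
    """Stack (25, 2) keypoint arrays from F frames into one flat (F * 50,) row."""
    return np.concatenate([kp.ravel() for kp in frames_keypoints])

# Toy frame and detector box; a real frame would come from the decoded video.
frame = np.random.default_rng(0).integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
result = blur_background(frame, box=(20, 10, 40, 38))

# 30 frames x 25 keypoints x (x, y) -> a 1500-dimensional feature vector.
features = merge_keypoints([np.zeros((25, 2)) for _ in range(30)])
```

One row per pitch in this layout makes the downstream classifier a straightforward sequence-agnostic model; a recurrent or temporal model would instead keep the per-frame axis.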
Figure 1. Example Video2Data pipeline outputs. (1) Video to image conversion (left). (2) Pitcher detection and background blurring (middle). (3) OpenPose pose estimation (right).